Serveur d'exploration sur la recherche en informatique en Lorraine

Attention, ce site est en cours de développement !
Attention, site généré par des moyens informatiques à partir de corpus bruts.
Les informations ne sont donc pas validées.

Random simulations of a datatable for efficiently mining reliable and non-redundant itemsets

Identifieur interne : 004917 ( Main/Exploration ); précédent : 004916; suivant : 004918

Random simulations of a datatable for efficiently mining reliable and non-redundant itemsets

Auteurs : Martine Cadot [France] ; Pascal Cuxac [France] ; Alain Lelu [France]

Source :

RBID : Hal:inria-00186100

Abstract

Our goal is twofold: 1) we want to mine the only statistically valid 2-itemsets out of a boolean datatable, 2) on this basis, we want to build the only higher-order non-redundant itemsets compared to their sub-itemsets. For the first task we have designed a randomization test (Tournebool) respectful of the structure of the data variables and independant from the specific distributions of the data. In our test set (959 texts and 8477 terms), this leads to a reduction from 126, 000 2-itemsets to 13, 000 significant ones, at the 99% confidence interval. For the second task, we have devised a hierarchical stepwise procedure (MIDOVA) for evaluating the residual amount of variation devoted to higher-order itemsets, yielding new possible positive or negative high-order relations. On our example, this leads to counts of 7,712 for 2-itemsets to 3 for 6-itemsets, and no higher-order ones, in a computationally efficient way.

Url:


Affiliations:


Links toward previous steps (curation, corpus...)


Le document en format XML

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Random simulations of a datatable for efficiently mining reliable and non-redundant itemsets</title>
<author>
<name sortKey="Cadot, Martine" sort="Cadot, Martine" uniqKey="Cadot M" first="Martine" last="Cadot">Martine Cadot</name>
<affiliation wicri:level="1">
<hal:affiliation type="researchteam" xml:id="struct-2359" status="OLD">
<idno type="RNSR">200118295L</idno>
<orgName>Analysis, perception and recognition of speech</orgName>
<orgName type="acronym">PAROLE</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.inria.fr/equipes/parole</ref>
</desc>
<listRelation>
<relation active="#struct-160" type="direct"></relation>
<relation name="UMR7503" active="#struct-441569" type="indirect"></relation>
<relation active="#struct-300009" type="indirect"></relation>
<relation active="#struct-300291" type="indirect"></relation>
<relation active="#struct-300292" type="indirect"></relation>
<relation active="#struct-300293" type="indirect"></relation>
<relation active="#struct-2496" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-160" type="direct">
<org type="laboratory" xml:id="struct-160" status="OLD">
<orgName>Laboratoire Lorrain de Recherche en Informatique et ses Applications</orgName>
<orgName type="acronym">LORIA</orgName>
<desc>
<address>
<addrLine>Campus Scientifique BP 239 54506 Vandoeuvre-lès-Nancy Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.loria.fr</ref>
</desc>
<listRelation>
<relation name="UMR7503" active="#struct-441569" type="direct"></relation>
<relation active="#struct-300009" type="direct"></relation>
<relation active="#struct-300291" type="direct"></relation>
<relation active="#struct-300292" type="direct"></relation>
<relation active="#struct-300293" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle name="UMR7503" active="#struct-441569" type="indirect">
<org type="institution" xml:id="struct-441569" status="VALID">
<idno type="ISNI">0000000122597504</idno>
<idno type="IdRef">02636817X</idno>
<orgName>Centre National de la Recherche Scientifique</orgName>
<orgName type="acronym">CNRS</orgName>
<date type="start">1939-10-19</date>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.cnrs.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300009" type="indirect">
<org type="institution" xml:id="struct-300009" status="VALID">
<orgName>Institut National de Recherche en Informatique et en Automatique</orgName>
<orgName type="acronym">Inria</orgName>
<desc>
<address>
<addrLine>Domaine de VoluceauRocquencourt - BP 10578153 Le Chesnay Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.inria.fr/en/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300291" type="indirect">
<org type="institution" xml:id="struct-300291" status="OLD">
<orgName>Université Henri Poincaré - Nancy 1</orgName>
<orgName type="acronym">UHP</orgName>
<date type="end">2011-12-31</date>
<desc>
<address>
<addrLine>24-30 rue Lionnois, BP 60120, 54 003 NANCY cedex, France</addrLine>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300292" type="indirect">
<org type="institution" xml:id="struct-300292" status="OLD">
<orgName>Université Nancy 2</orgName>
<date type="end">2011-12-31</date>
<desc>
<address>
<addrLine>91 avenue de la Libération, BP 454, 54001 Nancy cedex</addrLine>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300293" type="indirect">
<org type="institution" xml:id="struct-300293" status="OLD">
<orgName>Institut National Polytechnique de Lorraine</orgName>
<orgName type="acronym">INPL</orgName>
<date type="end">2011-12-31</date>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle active="#struct-2496" type="direct">
<org type="laboratory" xml:id="struct-2496" status="OLD">
<orgName>INRIA Lorraine</orgName>
<desc>
<address>
<addrLine>615 rue du Jardin Botanique 54600 Villers-lès-Nancy</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.inria.fr/centre-de-recherche-inria/nancy-grand-est</ref>
</desc>
<listRelation>
<relation active="#struct-300009" type="direct"></relation>
</listRelation>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName>
<settlement type="city">Nancy</settlement>
<region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
</placeName>
<orgName type="university">Université Nancy 2</orgName>
<orgName type="institution" wicri:auto="newGroup">Université de Lorraine</orgName>
<placeName>
<settlement type="city">Nancy</settlement>
<region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
</placeName>
<orgName type="university">Institut national polytechnique de Lorraine</orgName>
<orgName type="institution" wicri:auto="newGroup">Université de Lorraine</orgName>
</affiliation>
</author>
<author>
<name sortKey="Cuxac, Pascal" sort="Cuxac, Pascal" uniqKey="Cuxac P" first="Pascal" last="Cuxac">Pascal Cuxac</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-1814" status="VALID">
<orgName>Institut de l'information scientifique et technique</orgName>
<orgName type="acronym">INIST</orgName>
<desc>
<address>
<addrLine>2, Allée du Parc de Brabois CS 10310 F-54519 Vandoeuvre-lès-Nancy</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.inist.fr</ref>
</desc>
<listRelation>
<relation name="UPS76" active="#struct-441569" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle name="UPS76" active="#struct-441569" type="direct">
<org type="institution" xml:id="struct-441569" status="VALID">
<idno type="ISNI">0000000122597504</idno>
<idno type="IdRef">02636817X</idno>
<orgName>Centre National de la Recherche Scientifique</orgName>
<orgName type="acronym">CNRS</orgName>
<date type="start">1939-10-19</date>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.cnrs.fr/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
</affiliation>
</author>
<author>
<name sortKey="Lelu, Alain" sort="Lelu, Alain" uniqKey="Lelu A" first="Alain" last="Lelu">Alain Lelu</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-31478" status="OLD">
<idno type="IdRef">168612127</idno>
<idno type="ISNI">0000 0001 2193 4396</idno>
<idno type="RNSR">199613836L</idno>
<orgName>Laboratoire de Semio-Linguistique, Didactique et Informatique</orgName>
<orgName type="acronym">LASELDI</orgName>
<desc>
<address>
<addrLine>30 rue Mégevand 25030 Besançon cedex </addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-fcomte.fr/pages/fr/ea-2281---laseldi-7966.html</ref>
</desc>
<listRelation>
<relation name="EA2281" active="#struct-242365" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle name="EA2281" active="#struct-242365" type="direct">
<org type="institution" xml:id="struct-242365" status="VALID">
<idno type="IdRef">026403188</idno>
<idno type="ISNI">0000 0001 2188 3779 </idno>
<orgName>Université de Franche-Comté</orgName>
<orgName type="acronym">UFC</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-fcomte.fr</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName>
<settlement type="city" wicri:auto="siege">Besançon</settlement>
<region type="region" nuts="2">Franche-Comté</region>
</placeName>
<orgName type="university">Université de Franche-Comté</orgName>
<orgName type="institution" wicri:auto="newGroup">Université de Bourgogne Franche-Comté</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">HAL</idno>
<idno type="RBID">Hal:inria-00186100</idno>
<idno type="halId">inria-00186100</idno>
<idno type="halUri">https://hal.inria.fr/inria-00186100</idno>
<idno type="url">https://hal.inria.fr/inria-00186100</idno>
<date when="2007-05-29">2007-05-29</date>
<idno type="wicri:Area/Hal/Corpus">003F34</idno>
<idno type="wicri:Area/Hal/Curation">003F34</idno>
<idno type="wicri:Area/Hal/Checkpoint">003A62</idno>
<idno type="wicri:explorRef" wicri:stream="Hal" wicri:step="Checkpoint">003A62</idno>
<idno type="wicri:Area/Main/Merge">004A66</idno>
<idno type="wicri:Area/Main/Curation">004917</idno>
<idno type="wicri:Area/Main/Exploration">004917</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Random simulations of a datatable for efficiently mining reliable and non-redundant itemsets</title>
<author>
<name sortKey="Cadot, Martine" sort="Cadot, Martine" uniqKey="Cadot M" first="Martine" last="Cadot">Martine Cadot</name>
<affiliation wicri:level="1">
<hal:affiliation type="researchteam" xml:id="struct-2359" status="OLD">
<idno type="RNSR">200118295L</idno>
<orgName>Analysis, perception and recognition of speech</orgName>
<orgName type="acronym">PAROLE</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.inria.fr/equipes/parole</ref>
</desc>
<listRelation>
<relation active="#struct-160" type="direct"></relation>
<relation name="UMR7503" active="#struct-441569" type="indirect"></relation>
<relation active="#struct-300009" type="indirect"></relation>
<relation active="#struct-300291" type="indirect"></relation>
<relation active="#struct-300292" type="indirect"></relation>
<relation active="#struct-300293" type="indirect"></relation>
<relation active="#struct-2496" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-160" type="direct">
<org type="laboratory" xml:id="struct-160" status="OLD">
<orgName>Laboratoire Lorrain de Recherche en Informatique et ses Applications</orgName>
<orgName type="acronym">LORIA</orgName>
<desc>
<address>
<addrLine>Campus Scientifique BP 239 54506 Vandoeuvre-lès-Nancy Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.loria.fr</ref>
</desc>
<listRelation>
<relation name="UMR7503" active="#struct-441569" type="direct"></relation>
<relation active="#struct-300009" type="direct"></relation>
<relation active="#struct-300291" type="direct"></relation>
<relation active="#struct-300292" type="direct"></relation>
<relation active="#struct-300293" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle name="UMR7503" active="#struct-441569" type="indirect">
<org type="institution" xml:id="struct-441569" status="VALID">
<idno type="ISNI">0000000122597504</idno>
<idno type="IdRef">02636817X</idno>
<orgName>Centre National de la Recherche Scientifique</orgName>
<orgName type="acronym">CNRS</orgName>
<date type="start">1939-10-19</date>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.cnrs.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300009" type="indirect">
<org type="institution" xml:id="struct-300009" status="VALID">
<orgName>Institut National de Recherche en Informatique et en Automatique</orgName>
<orgName type="acronym">Inria</orgName>
<desc>
<address>
<addrLine>Domaine de VoluceauRocquencourt - BP 10578153 Le Chesnay Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.inria.fr/en/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300291" type="indirect">
<org type="institution" xml:id="struct-300291" status="OLD">
<orgName>Université Henri Poincaré - Nancy 1</orgName>
<orgName type="acronym">UHP</orgName>
<date type="end">2011-12-31</date>
<desc>
<address>
<addrLine>24-30 rue Lionnois, BP 60120, 54 003 NANCY cedex, France</addrLine>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300292" type="indirect">
<org type="institution" xml:id="struct-300292" status="OLD">
<orgName>Université Nancy 2</orgName>
<date type="end">2011-12-31</date>
<desc>
<address>
<addrLine>91 avenue de la Libération, BP 454, 54001 Nancy cedex</addrLine>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300293" type="indirect">
<org type="institution" xml:id="struct-300293" status="OLD">
<orgName>Institut National Polytechnique de Lorraine</orgName>
<orgName type="acronym">INPL</orgName>
<date type="end">2011-12-31</date>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle active="#struct-2496" type="direct">
<org type="laboratory" xml:id="struct-2496" status="OLD">
<orgName>INRIA Lorraine</orgName>
<desc>
<address>
<addrLine>615 rue du Jardin Botanique 54600 Villers-lès-Nancy</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.inria.fr/centre-de-recherche-inria/nancy-grand-est</ref>
</desc>
<listRelation>
<relation active="#struct-300009" type="direct"></relation>
</listRelation>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName>
<settlement type="city">Nancy</settlement>
<region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
</placeName>
<orgName type="university">Université Nancy 2</orgName>
<orgName type="institution" wicri:auto="newGroup">Université de Lorraine</orgName>
<placeName>
<settlement type="city">Nancy</settlement>
<region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
</placeName>
<orgName type="university">Institut national polytechnique de Lorraine</orgName>
<orgName type="institution" wicri:auto="newGroup">Université de Lorraine</orgName>
</affiliation>
</author>
<author>
<name sortKey="Cuxac, Pascal" sort="Cuxac, Pascal" uniqKey="Cuxac P" first="Pascal" last="Cuxac">Pascal Cuxac</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-1814" status="VALID">
<orgName>Institut de l'information scientifique et technique</orgName>
<orgName type="acronym">INIST</orgName>
<desc>
<address>
<addrLine>2, Allée du Parc de Brabois CS 10310 F-54519 Vandoeuvre-lès-Nancy</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.inist.fr</ref>
</desc>
<listRelation>
<relation name="UPS76" active="#struct-441569" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle name="UPS76" active="#struct-441569" type="direct">
<org type="institution" xml:id="struct-441569" status="VALID">
<idno type="ISNI">0000000122597504</idno>
<idno type="IdRef">02636817X</idno>
<orgName>Centre National de la Recherche Scientifique</orgName>
<orgName type="acronym">CNRS</orgName>
<date type="start">1939-10-19</date>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.cnrs.fr/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
</affiliation>
</author>
<author>
<name sortKey="Lelu, Alain" sort="Lelu, Alain" uniqKey="Lelu A" first="Alain" last="Lelu">Alain Lelu</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-31478" status="OLD">
<idno type="IdRef">168612127</idno>
<idno type="ISNI">0000 0001 2193 4396</idno>
<idno type="RNSR">199613836L</idno>
<orgName>Laboratoire de Semio-Linguistique, Didactique et Informatique</orgName>
<orgName type="acronym">LASELDI</orgName>
<desc>
<address>
<addrLine>30 rue Mégevand 25030 Besançon cedex </addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-fcomte.fr/pages/fr/ea-2281---laseldi-7966.html</ref>
</desc>
<listRelation>
<relation name="EA2281" active="#struct-242365" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle name="EA2281" active="#struct-242365" type="direct">
<org type="institution" xml:id="struct-242365" status="VALID">
<idno type="IdRef">026403188</idno>
<idno type="ISNI">0000 0001 2188 3779 </idno>
<orgName>Université de Franche-Comté</orgName>
<orgName type="acronym">UFC</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-fcomte.fr</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName>
<settlement type="city" wicri:auto="siege">Besançon</settlement>
<region type="region" nuts="2">Franche-Comté</region>
</placeName>
<orgName type="university">Université de Franche-Comté</orgName>
<orgName type="institution" wicri:auto="newGroup">Université de Bourgogne Franche-Comté</orgName>
</affiliation>
</author>
</analytic>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Our goal is twofold: 1) we want to mine the only statistically valid 2-itemsets out of a boolean datatable, 2) on this basis, we want to build the only higher-order non-redundant itemsets compared to their sub-itemsets. For the first task we have designed a randomization test (Tournebool) respectful of the structure of the data variables and independant from the specific distributions of the data. In our test set (959 texts and 8477 terms), this leads to a reduction from 126, 000 2-itemsets to 13, 000 significant ones, at the 99% confidence interval. For the second task, we have devised a hierarchical stepwise procedure (MIDOVA) for evaluating the residual amount of variation devoted to higher-order itemsets, yielding new possible positive or negative high-order relations. On our example, this leads to counts of 7,712 for 2-itemsets to 3 for 6-itemsets, and no higher-order ones, in a computationally efficient way.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>France</li>
</country>
<region>
<li>Franche-Comté</li>
<li>Grand Est</li>
<li>Lorraine (région)</li>
</region>
<settlement>
<li>Besançon</li>
<li>Nancy</li>
</settlement>
<orgName>
<li>Institut national polytechnique de Lorraine</li>
<li>Université Nancy 2</li>
<li>Université de Bourgogne Franche-Comté</li>
<li>Université de Franche-Comté</li>
<li>Université de Lorraine</li>
</orgName>
</list>
<tree>
<country name="France">
<region name="Grand Est">
<name sortKey="Cadot, Martine" sort="Cadot, Martine" uniqKey="Cadot M" first="Martine" last="Cadot">Martine Cadot</name>
</region>
<name sortKey="Cuxac, Pascal" sort="Cuxac, Pascal" uniqKey="Cuxac P" first="Pascal" last="Cuxac">Pascal Cuxac</name>
<name sortKey="Lelu, Alain" sort="Lelu, Alain" uniqKey="Lelu A" first="Alain" last="Lelu">Alain Lelu</name>
</country>
</tree>
</affiliations>
</record>

Pour manipuler ce document sous Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Lorraine/explor/InforLorV4/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 004917 | SxmlIndent | more

Ou

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 004917 | SxmlIndent | more

Pour mettre un lien sur cette page dans le réseau Wicri

{{Explor lien
   |wiki=    Wicri/Lorraine
   |area=    InforLorV4
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Hal:inria-00186100
   |texte=   Random simulations of a datatable for efficiently mining reliable and non-redundant itemsets
}}

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Jun 10 21:56:28 2019. Site generation: Fri Feb 25 15:29:27 2022